Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

نویسندگان

  • Reagan Mozer
  • Luke Miratrix
  • Aaron Russell Kaufman
  • L. Jason Anastasopoulos
چکیده

How should one perform matching in observational studies when the units are text documents? The lack of randomized assignment of documents into treatment and control groups may lead to systematic differences between groups on high-dimensional and latent features of text such as topical content and sentiment. Standard balance metrics, used to measure the quality of a matching method, fail in this setting. We decompose text matching methods into two parts: (1) a text representation, and (2) a distance metric, and present a framework for measuring the quality of text matches experimentally using human subjects. We consider 28 potential methods, and find that representing text as term vectors and matching on cosine distance significantly outperform alternative representations and distance metrics. We apply our chosen method to a substantive debate in the study of media bias using a novel data set of front page news articles from thirteen news sources. Media bias is composed of topic selection bias and presentation bias; using our matching method to control for topic selection, we find that both components contribute significantly to media bias, though some news sources rely on one component more than the other.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Assisted History Matching Workflow and its Application in a Full Field Reservoir Simulation Model

The significant increase in using reservoir simulation models poses significant challenges in the design and calibration of models. Moreover, conventional model calibration, history matching, is usually performed using a trial and error process of adjusting model parameters until a satisfactory match is obtained. In addition, history matching is an inverse problem, and hence it may have non-uni...

متن کامل

Evaluation of Similarity Measures for Template Matching

Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...

متن کامل

A procedure for Web Service Selection Using WS-Policy Semantic Matching

In general, Policy-based approaches play an important role in the management of web services, for instance, in the choice of semantic web service and quality of services (QoS) in particular. The present research work illustrates a procedure for the web service selection among functionality similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...

متن کامل

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

متن کامل

الگوریتم انطباق مرزی ترکیبی و سریع برای اختفای خطای زمانی داده‌های ویدئویی

Despite data resilient methods against error that are applied on video data in transmitter side, occurringerror along video data transferring for communication channels is inevitable. Error concealment is a useful method for improving the quality of damaged videos in receiver side. In this paper, a fast and hybrid boundary matching algorithm is presented for more accurate estimating of damaged ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1801.00644  شماره 

صفحات  -

تاریخ انتشار 2018